Degraded Script Identification for Indian Language- A Survey
Abstract
The performance of any optical character recognition (OCR) system depends largely on the print and paper quality of the input document image. A number of OCR techniques are available and claim accurate recognition of printed document images in Indian and foreign scripts. Only a few reports have been found on the recognition of degraded Indian-language documents. The degradation in a scanned printed document can be of many types. In this paper, we present a survey of degraded script identification for Indian-language documents.
Similar resources
Script Identification for Document Image Retrieval: A Survey
In recent years, many multimedia documents have been captured and stored thanks to advances in computer technology, and the demand for recognizing and retrieving such documents has increased tremendously. In such an environment, the large volume of data and the variety of scripts make manual identification unworkable. In such cases the ability to automatically determine the script, and further the...
AmritaCEN_NLP @ FIRE 2015 Language Identification for Indian Languages in Social Media Text
The progression of social media content, such as Twitter and Facebook messages and blog posts, has created many new opportunities for language technology. User-generated content such as tweets and blogs in most languages is written in Roman script due to distinct social culture and technology. Some users write in their own language's script or in mixed script. The primary challenges ...
Script Identification from Bilingual Gujarati-English Documents
In a multilingual country like India, English words are often interspersed within the Indian regional languages in most official papers, school textbooks, and magazines. A bilingual Optical Character Recognition (OCR) system is therefore needed that can recognize these bilingual documents and store them for future use. In this paper, the authors present an OCR system developed for the script ...
Labeling of Query Words using Conditional Random Field
This paper describes our approach to Query Word Labeling, submitted to the shared task on Mixed Script Information Retrieval at the Forum for Information Retrieval Evaluation (FIRE) 2015. The queries are written in Roman script, and the words are either English or transliterated from Indian regional languages. A total of eight Indian languages were present in addition to English. We also identified the...
Hindi-English Language Identification, Named Entity Recognition and Back Transliteration: Shared Task System Description
This paper presents an algorithm for word-level language identification, named entity recognition and classification, and transliteration of Indian-language words written in Roman script to their native Devanagari script from bilingual textual data. We propose the construction of an extensive, hierarchically structured dictionary and a hierarchical rule-based classifier to expedite word search ...
Publication date: 2014